Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable PUT checksums #320

Merged
merged 1 commit into from
Jun 29, 2023
Merged

Enable PUT checksums #320

merged 1 commit into from
Jun 29, 2023

Conversation

passaro
Copy link
Contributor

@passaro passaro commented Jun 28, 2023

Description of change

Enable Crc32c checksums when uploading objects. PUT requests now have an option to enable trailing checksums.

Does this change impact existing behavior?

N/A


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the Developer Certificate of Origin (DCO).

@passaro passaro temporarily deployed to PR integration tests June 28, 2023 13:07 — with GitHub Actions Inactive
@passaro passaro temporarily deployed to PR integration tests June 28, 2023 13:07 — with GitHub Actions Inactive
@passaro passaro requested a review from monthonk June 28, 2023 13:07
@passaro passaro temporarily deployed to PR integration tests June 28, 2023 13:07 — with GitHub Actions Inactive
@passaro passaro temporarily deployed to PR integration tests June 28, 2023 13:07 — with GitHub Actions Inactive
#[tokio::test]
async fn test_put_checksums() {
const PART_SIZE: usize = 5 * 1024 * 1024;
let (bucket, prefix) = get_test_bucket_and_prefix("test_put_object_abort");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let (bucket, prefix) = get_test_bucket_and_prefix("test_put_object_abort");
let (bucket, prefix) = get_test_bucket_and_prefix("test_put_checksums");

Comment on lines 254 to 266
impl Default for PutObjectParams {
fn default() -> Self {
Self {
trailing_checksums: true,
}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this should be on by default here -- as a generic S3 client it should be opt-in. But we should turn it on by default in mountpoint_s3::Uploader.

#[non_exhaustive]
pub struct PutObjectParams {}
pub struct PutObjectParams {
pub trailing_checksums: bool,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a doc comment (specifically that it's crc32c).

Copy link
Member

@jamesbornholt jamesbornholt Jun 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, maybe this should be pub checksum_config: Option<ChecksumConfig> and we re-export mountpoint_s3_crt::ChecksumConfig from this crate, so that we don't end up duplicating stuff if we add more options here? I don't feel strongly about it either way, especially since the CRT combines PUT and GET checksum config into the same struct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about that, but it's unclear to me whether we want to expose ChecksumConfig in the long run (vs something tailored to specific requests: PutObject, MPU, GetObject).
I'd rather keep it as a simple bool for now and think about a more complete API when/if we'll introduce more options.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd probably second having an enum/struct over boolean. Something like Option<ChecksumAlgorithm>, even if ChecksumAlgorithm will only be CRC32C today?

I'm thinking the API we're putting here matches up to this one in Java SDK: https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/services/s3/model/PutObjectRequest.Builder.html#checksumAlgorithm(software.amazon.awssdk.services.s3.model.ChecksumAlgorithm)

We don't have to re-export the CRT config struct, we could just use our own in mountpoint-s3-client if it makes more sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dannycjones, we could start adding a ChecksumAlgorithm enum, but what about the location (trailer vs header)? That's part of ChecksumConfig but (I believe) only trailer is supported on multi-part uploads (which is what put_object uses). Do we want to limit the API to what a specific request allows, or just error on misconfiguration? Related: will we have a separate put_object for non-mpu? Or maybe that would also be configurable?

I don't think we need to answer any of these questions now, so my preference is for the simplest possible option, knowing we will replace it later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I probably prefer to keep it simple for now, so I'm ok with using boolean. Maybe we open a new issue to review this configuration before we publish a new version of mountpoint-s3-client? I think that's the point where we say there will be no breaking change after this and you can safely use this new config.

}

/// Get out the inner pointer to the checksum config
pub fn to_inner_ptr(&self) -> *const aws_s3_checksum_config {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pub fn to_inner_ptr(&self) -> *const aws_s3_checksum_config {
pub(crate) fn to_inner_ptr(&self) -> *const aws_s3_checksum_config {

Comment on lines 859 to 862
location: aws_s3_checksum_location::AWS_SCL_TRAILER,
checksum_algorithm: aws_s3_checksum_algorithm::AWS_SCA_CRC32C,
validate_response_checksum: false,
validate_checksum_algorithms: std::ptr::null_mut(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think?

Suggested change
location: aws_s3_checksum_location::AWS_SCL_TRAILER,
checksum_algorithm: aws_s3_checksum_algorithm::AWS_SCA_CRC32C,
validate_response_checksum: false,
validate_checksum_algorithms: std::ptr::null_mut(),
location: aws_s3_checksum_location::AWS_SCL_TRAILER,
checksum_algorithm: aws_s3_checksum_algorithm::AWS_SCA_CRC32C,
..Default::default()

}

impl ChecksumConfig {
/// Create a new [ChecksumConfig] with Crc32c traling checksums.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Create a new [ChecksumConfig] with Crc32c traling checksums.
/// Create a new [ChecksumConfig] with Crc32c trailing checksums.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, note that this is only for PUT. It's weird (but understandable) that the CRT combines checksums for PUT and GET into this one struct, but here we are.

@passaro passaro temporarily deployed to PR integration tests June 29, 2023 08:17 — with GitHub Actions Inactive
@passaro passaro temporarily deployed to PR integration tests June 29, 2023 08:17 — with GitHub Actions Inactive
@passaro passaro temporarily deployed to PR integration tests June 29, 2023 08:17 — with GitHub Actions Inactive
@passaro passaro had a problem deploying to PR integration tests June 29, 2023 08:17 — with GitHub Actions Failure
@passaro passaro had a problem deploying to PR integration tests June 29, 2023 09:25 — with GitHub Actions Failure
@passaro passaro temporarily deployed to PR integration tests June 29, 2023 09:25 — with GitHub Actions Inactive
@passaro passaro temporarily deployed to PR integration tests June 29, 2023 09:25 — with GitHub Actions Inactive
@passaro passaro temporarily deployed to PR integration tests June 29, 2023 09:25 — with GitHub Actions Inactive
@passaro passaro had a problem deploying to PR integration tests June 29, 2023 12:25 — with GitHub Actions Failure
@passaro passaro temporarily deployed to PR integration tests June 29, 2023 12:25 — with GitHub Actions Inactive
@passaro passaro temporarily deployed to PR integration tests June 29, 2023 12:25 — with GitHub Actions Inactive
@passaro passaro temporarily deployed to PR integration tests June 29, 2023 12:25 — with GitHub Actions Inactive
@passaro passaro had a problem deploying to PR integration tests June 29, 2023 14:03 — with GitHub Actions Failure
@passaro passaro temporarily deployed to PR integration tests June 29, 2023 14:03 — with GitHub Actions Inactive
@passaro passaro temporarily deployed to PR integration tests June 29, 2023 14:03 — with GitHub Actions Inactive
@passaro passaro temporarily deployed to PR integration tests June 29, 2023 14:03 — with GitHub Actions Inactive
Signed-off-by: Alessandro Passaro <[email protected]>
@passaro passaro temporarily deployed to PR integration tests June 29, 2023 14:15 — with GitHub Actions Inactive
@passaro passaro temporarily deployed to PR integration tests June 29, 2023 14:15 — with GitHub Actions Inactive
@passaro passaro temporarily deployed to PR integration tests June 29, 2023 14:15 — with GitHub Actions Inactive
@passaro passaro temporarily deployed to PR integration tests June 29, 2023 14:15 — with GitHub Actions Inactive
@jamesbornholt jamesbornholt merged commit 5c1c831 into awslabs:main Jun 29, 2023
@passaro passaro deleted the put-checksums branch June 29, 2023 15:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants